Average Reward Timed Games
Abstract
We consider real-time games in which each player's goal is to maximize the average reward he or she receives per time unit. We consider zero-sum rewards, so that a reward of +r to one player corresponds to a reward of −r to the other player. The games are played on discrete-time game structures, which can be specified using a two-player version of timed automata whose locations are labeled by reward rates. Even though the rewards themselves are zero-sum, the games are not, due to the requirement that time must progress along a play of the game. Since we focus on control applications, we define the value of the game to a player to be the maximal average reward per time unit that the player can ensure. We show that, in general, the values to players 1 and 2 do not sum to zero. We provide algorithms for computing the value of the game for either player; the algorithms are based on the relationship between the original, infinite-round game and a derived game that is played for only finitely many rounds. As memoryless optimal strategies exist for both players in both games, we show that the problem of computing the value of the game is in NP∩coNP.
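As a rough illustration of the quantity being optimized, the sketch below computes the average reward per time unit over a finite prefix of a play. It is not the paper's algorithm: the function name `average_reward` and the encoding of a prefix as `(reward_rate, duration)` pairs with integer values are assumptions made here for illustration, and the value defined in the abstract concerns infinite plays and what a player can ensure, which a finite prefix does not capture.

```python
from fractions import Fraction


def average_reward(prefix):
    """Average reward per time unit for player 1 over a finite play prefix.

    `prefix` is a list of (reward_rate, duration) pairs: the play stays for
    `duration` time units in a location labeled with `reward_rate`.
    Both entries are assumed to be integers (discrete time); this encoding
    is hypothetical and not taken from the paper.
    """
    total_time = sum(duration for _, duration in prefix)
    if total_time == 0:
        # Time must progress; with no elapsed time the average is undefined.
        raise ValueError("no time has elapsed along this prefix")
    total_reward = sum(rate * duration for rate, duration in prefix)
    return Fraction(total_reward, total_time)


# Example prefix: 1 time unit at rate 2, 3 units at rate -1, 2 units at rate 4.
prefix = [(2, 1), (-1, 3), (4, 2)]
reward_player_1 = average_reward(prefix)
# On a single play the rewards are zero-sum, so player 2 receives the negation.
reward_player_2 = -reward_player_1
```

On any one play the two averages cancel, as in the last line above. The abstract's point is that the game values, meaning what each player can ensure, need not sum to zero, because of the requirement that time progress.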
Similar resources
Mean-Payoff Games on Timed Automata
Mean-payoff games on timed automata are played on the infinite weighted graph of configurations of priced timed automata between two players, Player Min and Player Max, who move a token along the states of the graph to form an infinite run. The goal of Player Min is to minimize the limit average weight of the run, while the goal of Player Max is the opposite. Brenguier, Cassez, and Raskin re...
Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games
In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermediate reward to be the difference between our score and our opponent’s score; the “true” reward of a win, loss, or tie is determined at the end of a game by applying a threshold function to the cumulative intermediate re...
Average-Time Games on Timed Automata
An average-time game is played on the infinite graph of configurations of a finite timed automaton. The two players, Min and Max, construct an infinite run of the automaton by taking turns to perform a timed transition. Player Min wants to minimise the average time per transition and player Max wants to maximise it. A solution of average-time games is presented using a reduction to average-price...
Average-Time Games
An average-time game is played on the infinite graph of configurations of a finite timed automaton. The two players, Min and Max, construct an infinite run of the automaton by taking turns to perform a timed transition. Player Min wants to minimize the average time per transition and player Max wants to maximize it. A solution of average-time games is presented using a reduction to average-pric...
Learning in Average Reward Stochastic Games: A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games
A large class of sequential decision-making problems under uncertainty with multiple competing decision makers can be modeled as stochastic games. Stochastic games can be viewed as multiplayer extensions of Markov decision processes (MDPs). In this paper, we develop a reinforcement learning algorithm to obtain average reward equilibrium for irreducible stochastic games. In our ...
Publication date: 2005